Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words
Authors
Abstract
We present a novel approach for discovering word categories, sets of words sharing a significant aspect of their meaning. We utilize meta-patterns of high-frequency words and content words in order to discover pattern candidates. Symmetric patterns are then identified using graph-based measures, and word categories are created based on graph clique sets. Our method is the first pattern-based method that requires no corpus annotation or manually provided seed patterns or words. We evaluate our algorithm on very large corpora in two languages, using both human judgments and WordNet-based evaluation. Our fully unsupervised results are superior to previous work that used a POS-tagged corpus, and computation time for huge corpora is orders of magnitude faster than previously reported.
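The pipeline outlined in the abstract (pattern candidates from high-frequency words, a graph-based symmetry test, clique-based categories) can be illustrated with a toy sketch. The abstract does not specify the actual meta-patterns, graph measures, or clique procedure, so everything below is an assumption made for illustration: the single meta-pattern "content word, high-frequency word, content word", the reciprocity ratio used as the symmetry measure, the 3-cliques used as category seeds, and all function names.

```python
from collections import defaultdict
from itertools import combinations


def extract_pairs(sentences, hfw):
    """Collect (x, y) pairs for each candidate pattern 'x H y'.

    Assumption: a pattern is one high-frequency word H flanked by two
    distinct content words; the paper's meta-patterns may be richer.
    """
    instances = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.split()
        for x, h, y in zip(tokens, tokens[1:], tokens[2:]):
            if h in hfw and x not in hfw and y not in hfw and x != y:
                instances[h].add((x, y))
    return instances


def symmetry_score(pairs):
    """Fraction of directed edges (x, y) whose reverse (y, x) also occurs.

    Assumption: a simple reciprocity ratio stands in for the paper's
    graph-based measures.
    """
    if not pairs:
        return 0.0
    return sum((y, x) in pairs for x, y in pairs) / len(pairs)


def symmetric_edges(pairs):
    """Undirected edges for word pairs seen in both orders."""
    return {frozenset((x, y)) for x, y in pairs if (y, x) in pairs}


def triangles(edges):
    """All 3-cliques of the undirected graph (brute force, toy scale only)."""
    nodes = sorted({w for e in edges for w in e})
    return [t for t in combinations(nodes, 3)
            if all(frozenset(p) in edges for p in combinations(t, 2))]


# Hypothetical toy corpus and high-frequency word list.
sentences = [
    "cats and dogs", "dogs and cats", "dogs and birds",
    "birds and dogs", "cats and birds", "birds and cats",
    "kings of spain", "capital of france",
]
hfw = {"and", "of"}
inst = extract_pairs(sentences, hfw)
```

On this toy input, "X and Y" behaves as a symmetric pattern because every pair occurs in both orders, while "X of Y" does not. A real corpus would need frequency thresholds to choose the high-frequency words and a far more efficient clique search.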
Similar resources
Unsupervised Concept Discovery In Hebrew Using Simple Unsupervised Word Prefix Segmentation for Hebrew and Arabic
Fully unsupervised pattern-based methods for discovery of word categories have been proven to be useful in several languages. The majority of these methods rely on the existence of function words as separate text units. However, in morphology-rich languages, in particular Semitic languages such as Hebrew and Arabic, the equivalents of such function words are usually written as morphemes attache...
A Young EFL Learner’s Lexical Development through Different Input and Output Frequency Patterns
The present study was undertaken to investigate the effects of varying frequency patterns (FPs) of words on the productive acquisition of a young EFL learner in a home setting. Target words were presented to the learner using games and role plays. They were subsequently traced for their frequencies in input and output. Eighteen immediate tests and delayed tests were administered to measure the ...
Superior and Efficient Fully Unsupervised Pattern-based Concept Acquisition Using an Unsupervised Parser
Sets of lexical items sharing a significant aspect of their meaning (concepts) are fundamental for linguistics and NLP. Unsupervised concept acquisition algorithms have been shown to produce good results, and are preferable over manual preparation of concept resources, which is labor intensive, error prone and somewhat arbitrary. Some existing concept mining methods utilize supervised language-...
Minimally Supervised Classification to Semantic Categories using Automatically Acquired Symmetric Patterns
Classifying nouns into semantic categories (e.g., animals, food) is an important line of research in both cognitive science and natural language processing. We present a minimally supervised model for noun classification, which uses symmetric patterns (e.g., “X and Y”) and an iterative variant of the k-Nearest Neighbors algorithm. Unlike most previous works, we do not use a predefined set of sy...
An evaluation of graph clustering methods for unsupervised term discovery
Unsupervised term discovery (UTD) is the task of automatically identifying the repeated words and phrases in a collection of speech audio without relying on any language-specific resources. While the solution space for the task is far from fully explored, the dominant approach to date decomposes the discovery problem into two steps, where (i) segmental dynamic time warping is used to search the...